The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aid in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora and of tracking complex temporal changes within them. Our framework is based on (i) the discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes. More specifically, this is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena: topic splitting and speciation, and topic convergence and merging, in addition to the more widely recognized emergence and disappearance, and gradual evolution. The proposed framework is evaluated on a public medical literature corpus
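
The third component, the temporal similarity graph, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the framework itself: topics are hand-written word-weight dictionaries rather than the output of a hierarchical Dirichlet process, similarity is plain cosine similarity, and the linking threshold of 0.5 is arbitrary. The function names `cosine`, `similarity_graph` and `label_events` are hypothetical. Topics in consecutive epochs are linked when they are sufficiently similar, and the number of incoming and outgoing links of each topic then suggests whether it emerged, disappeared, split, or merged.

```python
from collections import Counter
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def similarity_graph(epochs, threshold=0.5):
    """Link topics in consecutive epochs whose similarity meets the threshold.

    `epochs` is a list (one entry per epoch) of {topic_name: word_weights}.
    The threshold value is an illustrative assumption.
    """
    edges = []
    for t in range(len(epochs) - 1):
        for a, dist_a in epochs[t].items():
            for b, dist_b in epochs[t + 1].items():
                sim = cosine(dist_a, dist_b)
                if sim >= threshold:
                    edges.append(((t, a), (t + 1, b), sim))
    return edges


def label_events(epochs, edges):
    """Label each topic using its in-degree and out-degree in the graph."""
    out_deg = Counter(src for src, _, _ in edges)
    in_deg = Counter(dst for _, dst, _ in edges)
    last = len(epochs) - 1
    events = {}
    for t, topics in enumerate(epochs):
        for name in topics:
            node = (t, name)
            labels = []
            if t > 0 and in_deg[node] == 0:
                labels.append("emerged")      # no predecessor
            if t < last and out_deg[node] == 0:
                labels.append("disappeared")  # no successor
            if out_deg[node] > 1:
                labels.append("split")        # several successors
            if in_deg[node] > 1:
                labels.append("merged")       # several predecessors
            events[node] = labels
    return events


# Toy data (invented for this sketch): three epochs of hand-made topics.
epochs = [
    {"imaging": {"mri": 0.5, "ct": 0.5}, "genetics": {"gene": 1.0}},
    {"mri": {"mri": 0.9, "scan": 0.1}, "ct": {"ct": 0.9, "scan": 0.1}},
    {"radiology": {"mri": 0.5, "ct": 0.5}},
]
edges = similarity_graph(epochs)
events = label_events(epochs, edges)
for node, labels in events.items():
    print(node, labels)
```

A topic with exactly one predecessor and one successor receives no label here, which corresponds loosely to gradual evolution. Distinguishing splitting from speciation, or convergence from merging, would need more than degree counts and is beyond this sketch.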